Abstract: In data mining, association rule learning is a popular
and well-researched method for discovering interesting relations
between variables in large databases. It is intended to identify
strong rules discovered in databases using different measures of
interestingness. Association rules were introduced for discovering
regularities between products in large-scale transaction data
recorded by point-of-sale systems in supermarkets. In this paper
we present a novel Q-based FP-tree technique that greatly
reduces the need to traverse FP-trees and Q-based FP-trees, thus
obtaining significantly improved performance for FP-tree based
algorithms. The technique works especially well for sparse
datasets. We then present a new algorithm which uses the Q FP-tree
data structure in combination with the FP- [...]. Experimental
results show that the new algorithm outperforms other algorithms
not only in speed, but also in CPU consumption and scalability.
I. INTRODUCTION

The problem of mining association rules from a data
stream has been addressed by many authors, but several
issues remain to be addressed. In this part we review the
literature on data stream mining. The work in this domain can be
classified into three areas, namely exact methods for frequent
item set mining, approximate methods, and memory management
techniques adopted for data stream mining [1,2].

Let I = {i1, i2, ..., in} be a set of items. We call X ⊆ I an item
set, and we call X a k-item set if the cardinality of X is k. Let
database T be a multiset of subsets of I, and let support(X) be
the percentage of item sets Y in T such that X ⊆ Y. Informally,
the support of an item set measures how often it occurs in the
database. If support(X) ≥ min_sup, we say that X is a frequent
item set, and we denote the set of all frequent item sets by FI.
A closed frequent item set is a frequent item set X such that
there exists no proper superset of X with the same support count
as X. If X is frequent and no proper superset of X is frequent,
we say that X is a maximal frequent item set, and we denote the
set of all maximal frequent item sets by MFI [7].
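
The definitions above can be made concrete with a small brute-force
sketch. It is only an illustration of support, FI, closed item sets and
MFI on a toy database; the function names and the sample transactions
are assumptions introduced here, and the exhaustive enumeration is not
the mining method studied in this paper.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def frequent_itemsets(transactions, min_sup):
    """Brute-force FI: map each frequent item set to its support."""
    items = sorted(set().union(*transactions))
    fi = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            s = support(x, transactions)
            if s >= min_sup:
                fi[x] = s
    return fi


def closed_itemsets(fi):
    """Frequent item sets with no proper superset of equal support."""
    return {x for x in fi if not any(x < y and fi[y] == fi[x] for y in fi)}


def maximal_itemsets(fi):
    """Frequent item sets with no frequent proper superset (MFI)."""
    return {x for x in fi if not any(x < y for y in fi)}


# Hypothetical toy database of four transactions.
T = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("a")]
FI = frequent_itemsets(T, min_sup=0.5)
CLOSED = closed_itemsets(FI)
MFI = maximal_itemsets(FI)
```

On this toy database every maximal frequent item set is also closed,
which matches the definitions: MFI is contained in the closed frequent
item sets, which are contained in FI.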

This is the inherent cost of candidate generation, no matter 
what implementation technique is applied. It is tedious to 
repeatedly scan the database and check a large set of candidates 
by pattern matching, which is especially true for mining long 
patterns. Can one develop a method that may avoid candidate
generation-and-test and utilize some novel data structures to
reduce the cost in frequent-pattern mining? This is the 
motivation of this study [6]. 

II. RELATED WORK 

In the aforementioned FP-growth method [2], a novel data
structure, the FP-tree (Frequent Pattern tree), is used. The
FP-tree is a compact data structure for storing all necessary
information about frequent item sets in a database. Every
branch of the FP-tree represents a frequent item set, and the
nodes along the branch are ordered decreasingly by the
frequency of the corresponding item, with leaves representing
the least frequent items. Each node in the FP-tree has three
fields: item-name, count and node-link, where item-name
registers which item this node represents, count registers the
number of transactions represented by the portion of the path
reaching this node, and node-link links to the next node in the
FP-tree carrying the same item-name, or null if there is none.
The FP-tree has a header table associated with it. Single items
are stored in the header table in decreasing order of frequency.
Each entry in the header table consists of two fields, item-name
and head of node-link (a pointer pointing to the first node in the
FP-tree carrying the item-name). Compared with Apriori [1]
and its variants, which need several database scans, the
FP-growth method only needs two database scans when mining all
frequent item sets. In the first scan, all frequent items are
found. The second scan constructs the first FP-tree, which
contains all frequency information of the original dataset.
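
The two-scan construction described above can be sketched as follows.
This is a minimal illustration, not the reference implementation: the
class and function names, the parent and children fields (added here
only to make the tree navigable), the tie-breaking rule for equally
frequent items, and the sample transactions are all assumptions.

```python
from collections import Counter


class FPNode:
    """FP-tree node: item-name, count and node-link, plus tree pointers."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.node_link = None  # next node carrying the same item, or None
        self.parent = parent   # assumed helper field
        self.children = {}     # assumed helper field: item -> child node


def build_fp_tree(transactions, min_count):
    # First scan: count items and keep the frequent ones, most frequent first.
    freq = Counter(item for t in transactions for item in set(t))
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    order = [item for item, c in ranked if c >= min_count]
    rank = {item: r for r, item in enumerate(order)}

    root = FPNode(None)
    header = {item: None for item in order}  # item -> head of node-link
    tails = {}                               # item -> last node in its chain

    # Second scan: insert the ordered frequent items of each transaction.
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
                if header[item] is None:
                    header[item] = child
                else:
                    tails[item].node_link = child
                tails[item] = child
            child.count += 1
            node = child
    return root, header


def item_support(header, item):
    """Sum the counts along the node-link chain of one item."""
    total, node = 0, header[item]
    while node is not None:
        total += node.count
        node = node.node_link
    return total


# Hypothetical sample database.
DB = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"], ["b", "d"]]
root, header = build_fp_tree(DB, min_count=2)
```

Following the node-link chain from a header entry visits every node
that carries the item, so the item's support count can be recovered
without rescanning the database.
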
Mining the database then becomes mining the FP-tree. Figure
2(a) shows a database example. After the first scan, all frequent
items are inserted in the header table of an initial FP-tree.
Figure 2(b) shows the first FP-tree constructed from the second
scan. The FP-growth method relies on the following principle:
if X and Y are two item sets, the support of item set X ∪ Y in
the database is exactly that of Y in the restriction of the
database to those transactions containing X. This restriction of
the database is called the conditional pattern base of X. Given
an item in the header table, the FP-growth method constructs a
new FP-tree corresponding to the frequency information in the
sub-dataset of only those transactions that contain the given
item. The complete set of frequent item sets is generated from
all single-path FP-trees [5].

The UFP-growth algorithm was extended from the FP-growth
algorithm, which is one of the most well-known pattern mining
algorithms for deterministic databases. Like the traditional
FP-growth algorithm, the UFP-growth algorithm first builds an
index tree, called the UFP-tree, to store all information of the
uncertain database. Then, based on the UFP-tree, the algorithm
recursively builds conditional subtrees and finds expected
support-based frequent itemsets (see Figure 1, where
min_esup = 0.25).
Figure 3. Frequency of Sample Database
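
The principle behind FP-growth stated in this section (the support of
X ∪ Y in the database equals the support of Y in the restriction of
the database to the transactions containing X) can be checked on a toy
database. The sketch below works on plain transaction lists rather
than on an FP-tree; the helper names and the sample data are
assumptions made for illustration.

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def restrict(transactions, x):
    """Restriction of the database to the transactions containing X."""
    return [t for t in transactions if x <= t]


# Hypothetical sample database.
DB = [frozenset("abc"), frozenset("ab"), frozenset("ac"),
      frozenset("a"), frozenset("bd")]

X, Y = frozenset("a"), frozenset("b")
lhs = support_count(X | Y, DB)            # support of X ∪ Y in the database
rhs = support_count(Y, restrict(DB, X))   # support of Y in the restriction
```

Here both sides count the two transactions that contain a and b. This
equality is what allows the mining step to recurse on ever smaller
conditional databases instead of rescanning the full one.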

III. PROPOSED WORK 

Algorithm of WS with array-based technique: The improved
FP-tree (IFP-tree) is similar to the FP-tree, and each node in
the IFP-tree consists of four fields: item, count, ahead and
next. Here item registers which item this node represents, count
registers the number of transactions represented by the portion
of the path reaching this node, ahead links to the left child or
the parent of the node, and next links to the right brother of
the node or to the next node in the IFP-tree carrying the same
item, or null if there is none. We also define two arrays,
nodecnt and link: link[item] registers a pointer to the first
node in the IFP-tree carrying this item, and nodecnt[item]
registers the sum of the support counts of the nodes in the
IFP-tree that carry the same item. In comparison with the
FP-tree, the IFP-tree doesn't contain the path from root to
leaf-node and contains fewer pointers in the mining process,
and so may greatly reduce memory cost. The construction method
of the IFP-tree is similar to that of the FP-tree; the
difference lies in the process of inserting the frequent items
of each transaction into the IFP-tree. In this paper, we don't
adopt the method of recursively performing the procedure
insert_tree([p|P], t), but employ a dynamic pointer to complete
it.
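
One possible reading of this node layout is sketched below. It assumes
that ahead and next act as left-child and right-sibling pointers while
the tree is being built, and are then repointed to the parent and to
the next node carrying the same item in a root-first pass. That
two-phase interpretation, the function names, the use of dictionaries
for nodecnt and link, and the sample input are assumptions, not
details confirmed by the text.

```python
class IFPNode:
    """IFP-tree node with the four fields named in the text."""
    __slots__ = ("item", "count", "ahead", "next")

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.ahead = None  # building: leftmost child; afterwards: parent
        self.next = None   # building: right sibling; afterwards: same-item node


def insert(root, items):
    """Insert one ordered list of frequent items without recursion."""
    p_current = root  # the dynamic pointer
    for item in items:
        child, last = p_current.ahead, None
        while child is not None and child.item != item:  # scan sibling list
            last, child = child, child.next
        if child is None:  # no matching child: append a new node
            child = IFPNode(item)
            if last is None:
                p_current.ahead = child
            else:
                last.next = child
        child.count += 1
        p_current = child


def finalize(root):
    """Root-first pass: repoint ahead/next and fill nodecnt and link."""
    nodecnt, link, tail = {}, {}, {}
    stack = [(root.ahead, root)]
    while stack:
        node, parent = stack.pop()
        if node is None:
            continue
        stack.append((node.next, parent))  # right sibling, visited later
        stack.append((node.ahead, node))   # leftmost child, visited next
        node.ahead, node.next = parent, None
        nodecnt[node.item] = nodecnt.get(node.item, 0) + node.count
        if node.item in tail:
            tail[node.item].next = node
        else:
            link[node.item] = node
        tail[node.item] = node
    root.ahead = None
    return nodecnt, link


def prefix_path(node):
    """Items on the path from the root down to (not including) node."""
    path, cur = [], node.ahead
    while cur.ahead is not None:  # the root is the only node without a parent
        path.append(cur.item)
        cur = cur.ahead
    return path[::-1]


root = IFPNode("root")
for items in (["a", "b", "c"], ["a", "b"], ["a", "c"], ["a"], ["b"]):
    insert(root, items)
nodecnt, link = finalize(root)
```

After the pass, each node keeps only two pointers, and the prefix path
of any node is recovered by following ahead up to the root, which is
consistent with the claim that the IFP-tree needs fewer pointers
during mining.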

A. The algorithm for constructing the IFP-tree is as follows:
Procedure FP-tree constructs (T, min_sup) 

1) Scan T and count the support of each item; derive the frequent item
set (F) and a list (L) of frequent items, in which items are ordered in
frequency-descending order;

2) The root of the IFP-tree is created and labeled with "root";

3) For each transaction t ∈ T do

Frequent item set It = t ∩ F, in which items are listed in St according
to the order of L; define a dynamic pointer (p_current) which points
to root.

Procedure WSFP-tree constructs (T, min_sup) 

1) Scan T and count the support of each item; derive the frequent item
set (F) and a list (L) of frequent items, in which items are listed in
order of occurrence;

2) The root of the IFP-tree is created and labeled with "root";

3) For each transaction t ∈ T do

Frequent item set It = t ∩ F, in which items are listed in St according
to the order of occurrence L; define a dynamic pointer (p_current)
which points to root.

4) Traverse the IFP-tree in root-first order and transfer the ahead and
next pointers; sum the support counts of the nodes carrying the same
item and link those nodes together.
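
Steps 1 and 3 of the two procedures above appear to differ only in how
the list L is ordered. The sketch below shows both orderings side by
side. It assumes that "order of occurrence" means the order in which
items are first seen while scanning T; that reading, the order
parameter, the function names, and the sample transactions are
assumptions introduced for illustration.

```python
from collections import Counter


def frequent_item_list(transactions, min_count, order="frequency"):
    """Step 1: scan T, count supports, and return the ordered list L."""
    if order not in ("frequency", "occurrence"):
        raise ValueError("order must be 'frequency' or 'occurrence'")
    counts = Counter()
    first_seen = {}  # item -> position of its first occurrence in the scan
    for t in transactions:
        for item in dict.fromkeys(t):  # drop duplicates, keep order
            counts[item] += 1
            first_seen.setdefault(item, len(first_seen))
    L = [item for item in first_seen if counts[item] >= min_count]
    if order == "frequency":
        # Frequency-descending, ties broken by first occurrence.
        L.sort(key=lambda item: (-counts[item], first_seen[item]))
    return L


def project(transaction, L):
    """Step 3: It = t ∩ F, with items listed according to the order of L."""
    present = set(transaction)
    return [item for item in L if item in present]


# Hypothetical sample database.
DB = [["c", "a"], ["b", "a"], ["a", "c", "d"]]
L_freq = frequent_item_list(DB, min_count=2, order="frequency")
L_occ = frequent_item_list(DB, min_count=2, order="occurrence")
```

Each projected list would then be inserted into the tree by moving
p_current from the root downward, one item at a time.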




Figure 4. FP-tree construction
Figure 5. WSFP-tree construction

IV. EXPERIMENTAL EVALUATION 

The experiments were conducted on a 2.4 GHz Pentium with
512 MB of memory running Microsoft Windows XP. All code
was compiled using Matlab 7.10. We used the Connect-4 dataset,
a real and dense dataset downloaded from a website, to test and
compare the FP-tree with the WSFP-tree. Fig. 6 and Fig. 7
show the experimental results. Here we can see that ABWSFP
outperforms WSFP for high levels of minimum support, but it
is slow for very low levels.




Figure 6. Graphical Representation of Calculated Result
Figure 7. CPU Utilization

V. CONCLUSIONS 

In this paper, an efficient algorithm, called mining frequent
pattern, is proposed for mining maximal frequent patterns based
on an improved FP-tree. The algorithm improves the conventional
FP-tree and, by introducing the concept of the array sub-tree,
avoids generating maximal frequent candidate patterns in the
mining process, and therefore greatly reduces memory
consumption. It also uses an array-based technique to reduce
the time spent traversing the improved FP-tree, which improves
mining efficiency and scalability in both time and space.
Experimental results show that it possesses high mining
efficiency and scalability.